You found it --- this tells you how to use the JIT compiler.
First things first: This is not an explanation of how to use UAE under
Linux, or how to use UAE in general. There is other documentation about
that, and if you are not familiar with it, PLEASE read it! This document
only ever mentions stuff specific to the JIT compiler versions!
More disclaimers: This stuff is still definitely in a pre-Beta stage.
It works for me, and I am trying to make sure it works for others, too,
but it is impossible to test even a significant portion of the possible
configurations.
Most of my testing is done on 8 bit screens. Other bit depths *should*
work, but have seen little to no testing.
Things might crash at any time, and in interesting ways, and while you
can curse me for it, that's the worst you can do. No liability whatsoever!
There are two executables, uae_Xwin and uae_DGA2. Normally, you will want
to use uae_Xwin; it is the much more mature and less experimental one.
Both will connect to the X display in your DISPLAY environment variable,
bring up the GTK GUI (unless you disable it in the config file), and start
the emulation.
If you have a working configuration file for UAE/linux, then you can
use it as-is with these executables. However, be aware that the default
settings for the new configuration options are extremely conservative,
and to get best performance, you should really change them (see below).
[Update: This is no longer true. The default settings are now pretty
much optimal, and you probably won't have any reason to change them!]
If you don't have a working configuration file, each executable comes with
a sample config file. Of course, you'll have to change a lot of options,
because your setup and mine are different, but it is a start. I recommend,
however, that you first get and install pristine UAE 0.8.15, and make
sure you *do* have *that* working correctly. UAE-JIT will have no chance
whatsoever of working correctly otherwise.
There are several new options in the config file. Please take the time
to read through this, so you know what you are dealing with!
comptrustbyte:
comptrustword:
comptrustlong: Possible values are direct, indirect, indirectKS,
and afterPic.
***** These options are obsolete now! Leave them at their *****
***** default value of "direct" unless you really, really *****
***** have a good reason for changing them! *****
These describe how aggressive to be when it comes to accessing
Amiga memory. If you choose "direct", the emulation will be
very aggressive. If you choose "indirect", the emulation will
always use the slower but safe method. "indirectKS" will
use the aggressive method for all code except Kickstart code,
and "afterPic" uses the safe method until the first time
a Picasso96 mode is switched on, and the aggressive method
from then on.
I usually use "afterPic" for all of them; if this fails
(you get a core dump and UAE exits suddenly --- for me that
happens when starting SysInfo or GeneticSpecies2), it usually
is enough to set comptrustbyte to "indirect". Defaults are
"indirect" for all three.
If you are using P96 graphics (and why wouldn't you?), there isn't
much point in setting this to "direct". During startup, weird
and wonderful things happen in the Amiga, and trusting the
aggressive method only once that difficult time is over is
certainly a wise thing to do.
comptrustnaddr: Same as above.
***** This option is obsolete now! Leave it at its *****
***** default value of "direct" unless you really, really *****
***** have a good reason for changing it! *****
I have yet to find any software that can't handle
"afterPic", and I'd be very surprised if there is
any. If you find something that works with "indirect",
but not with "afterPic", please tell me!
compnf: "yes" or "no". Whether to optimize away flag generation when
it isn't needed. There really shouldn't be any reason why
you'd want to set this to "no"; if you find something that
works with "no" and doesn't with "yes", that's a bug and
I need to know about it! The reverse is a bug, too, but
hopefully I squashed that one before the release ;-)
cachesize: The size (in kb) the JIT compiler uses to store pretranslated
code. When this becomes full, or when the OS issues a
flush icache instruction, this gets completely emptied, and
then refilled during execution. Setting it to 0 will
disable the JIT compiler.
comp_flushmode: *NEW* "hard" or "soft". If this is set to soft (the default),
an OS induced icache flush doesn't actually empty the
cache, but instead checksumming will be used to check whether
blocks have to be discarded. You'll probably want to leave this
at its default (otherwise lots of stuff, like the OS, gets
translated over and over).
comp_constjump: *NEW* If this is "yes" (the default), unconditional branches
will not end a block; effectively, UAE-JIT compiles "through"
them. Generally, that's a good idea, as it improves performance.
However, it makes soft cache flushing impossible for some blocks,
so if you experience lots and lots of soft cache flushes (e.g.
when using a Mac emulator), you might try "no" and see whether it
does any better.
compfpu: If this is "yes" (the default), the JIT compiler will
be used for the most commonly used FPU instructions. Setting
it to "no" will disable JIT-compiling for the FPU.
[Note: The "unroll" option is no longer supported. You should remove it
from your config files if it's still in there]
[Note2: Setting some of those options to sub-optimal values will cause
UAE-JIT to exit with a message pointing at README.JIT-tuning]
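
The difference between hard and soft cache flushes described under
cachesize and comp_flushmode can be illustrated with a toy model. This is
only a sketch of the general idea, not UAE-JIT's actual code: the class
name BlockCache, its methods, and the use of CRC32 as the checksum are all
assumptions made up for this illustration.

```python
import zlib


class BlockCache:
    """Toy model of a translation cache with hard and soft flushes.

    Hypothetical illustration only; not taken from the UAE-JIT sources.
    """

    def __init__(self, memory):
        self.memory = memory      # bytearray standing in for Amiga memory
        self.blocks = {}          # start -> (length, checksum, needs_check)
        self.translations = 0     # number of blocks (re)translated so far

    def _checksum(self, start, length):
        # CRC32 is an arbitrary choice of checksum for this sketch.
        return zlib.crc32(bytes(self.memory[start:start + length]))

    def execute(self, start, length):
        """Run the block at 'start'; report how the cache handled it."""
        entry = self.blocks.get(start)
        if entry is not None:
            old_len, old_sum, needs_check = entry
            if not needs_check:
                return "cached"
            # After a soft flush, compare checksums before reusing the block.
            if old_len == length and old_sum == self._checksum(start, length):
                self.blocks[start] = (old_len, old_sum, False)
                return "revalidated"
        # Cache miss, or the source bytes changed: translate again.
        self.blocks[start] = (length, self._checksum(start, length), False)
        self.translations += 1
        return "translated"

    def hard_flush(self):
        """Empty the cache completely; everything gets retranslated."""
        self.blocks.clear()

    def soft_flush(self):
        """Keep the translations, but mark every block as unverified."""
        for start, (length, checksum, _) in list(self.blocks.items()):
            self.blocks[start] = (length, checksum, True)
```

In this model, a block that is executed again after soft_flush() is
reported as "revalidated" as long as its source bytes are unchanged, and
as "translated" once they have been modified. After hard_flush(), every
block is "translated" again, which is the repeated work the soft mode is
meant to avoid.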
================= All of the above can be set from the GTK GUI, too ===========
================= The options below are one-time, config-file only ============
avoid_cmov: "yes" or "no". If you have a processor that doesn't support
the P6-class CMOV instructions, you have to set this to "yes".
The JIT compiler will then not try to translate any
instructions for which it would generate code with CMOV
in it. Better slower than "illegal instruction", right ;-)
avoid_dga: If you use the Xwin executable, setting this to "yes" will
stop it from even looking for the DGA extension. Obviously,
it won't use it, either.
avoid_vid: If you use the Xwin executable, setting this to "yes" will
stop it from even looking for the Vidmode extension. Obviously,
it won't use it, either.
[Note: the following options are not available in the "sanitized" versions
of UAE-JIT. The executables made available on byron@csse.monash.edu.au
are not sanitized, but if you compile your own from the patches,
you need to include the "extra options" patch to get these. And don't
take my use of the plural in this paragraph to mean anything --- it
is generic ;-) ]
override_dga_address: If you use the DGA2 executable, this will allow you
to override the linear frame buffer address DGA2 detects.
Try it first without this, but if you just get a blank grey
screen (and F12-S gets you a window with the right content),
your XServer might get it wrong (seems fairly common, in fact).
Find out the linear frame buffer address (preferably by looking
at /proc/nnnnn/maps, with nnnnn the pid of the X server --- look
for a mapping of /dev/mem with the right size; the offset of that
mapping is the value you are looking for).
In this option, you provide the *upper 16 bits* of that address.
So if your linear frame buffer is at 0xd5000000, you set
override_dga_address to 0xd500. Yes, the config file will take
hex numbers.
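
The lookup described for override_dga_address can be sketched as a small
script. Treat it as an illustration under assumptions: the function names
and the SAMPLE_MAPS text are made up, and the 16 MB mapping size is just
an example value. The line layout assumed is the usual one for
/proc/nnnnn/maps (address range, permissions, offset, device, inode,
path).

```python
def find_dev_mem_offsets(maps_text, size_bytes):
    """Return the offsets of all /dev/mem mappings of the given size."""
    offsets = []
    for line in maps_text.splitlines():
        fields = line.split()
        # Expected fields: start-end perms offset dev inode pathname
        if len(fields) < 6 or fields[5] != "/dev/mem":
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        if end - start == size_bytes:
            offsets.append(int(fields[2], 16))
    return offsets


def override_value(framebuffer_address):
    """Upper 16 bits of a 32-bit address, as a hex string for the config."""
    return hex((framebuffer_address >> 16) & 0xFFFF)


# Made-up sample in the style of /proc/nnnnn/maps, for illustration only.
SAMPLE_MAPS = """\
08048000-081c3000 r-xp 00000000 03:05 12345 /usr/X11R6/bin/XFree86
40a00000-41a00000 rw-s d5000000 03:05 678 /dev/mem
41a00000-41a10000 rw-s 000a0000 03:05 678 /dev/mem
"""

if __name__ == "__main__":
    # Assumes a 16 MB linear frame buffer; substitute your card's size.
    for offset in find_dev_mem_offsets(SAMPLE_MAPS, 16 * 1024 * 1024):
        print("override_dga_address=" + override_value(offset))
```

With the sample text, the only 16 MB mapping of /dev/mem has the offset
0xd5000000, so the value to put in the config file is 0xd500, matching
the example above.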
============================ End of Options =============================
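
Whether avoid_cmov needs to be "yes" can usually be read off the CPU flags
the kernel reports. The sketch below assumes that /proc/cpuinfo on your
x86 kernel has a "flags" line that lists "cmov" when the instruction is
supported; the function names are made up for this example.

```python
def cpu_has_cmov(cpuinfo_text):
    """Return True if the first 'flags' line lists the cmov feature."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return "cmov" in value.split()
    # No flags line found: be conservative and assume no CMOV.
    return False


def avoid_cmov_setting(cpuinfo_text):
    """Suggest a config line based on the reported CPU flags."""
    if cpu_has_cmov(cpuinfo_text):
        return "avoid_cmov=no"
    return "avoid_cmov=yes"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as handle:
        print(avoid_cmov_setting(handle.read()))
```

If no flags line is found at all, the sketch suggests avoid_cmov=yes,
on the "better slower than illegal instruction" principle.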
Many of these options can be changed through the GTK UI. However, as many
of them influence code *generation*, changes will only take effect when
code is newly translated; the already translated code in the cache is
unaffected.
In order to make your changes take effect, you need to force a hard cache
flush. The easiest way to do so is to change the cache size by some small
amount. Remember this step if you try to benchmark the result of various
option settings on performance, otherwise results will be rather
inconclusive ;-)
How to get the maximum performance:
-----------------------------------
Here are a few tips on how to get the best possible performance, and to
avoid common pitfalls.
* Use a 2.3.*, or even better a 2.4test* kernel. Without it, you might
not be able to do aggressive memory modes (see README.JIT-tuning)
* The really aggressive memory modes use sysv_shm. By default, the
largest sysv_shm block you can allocate at one time is 32M, so
if you have a larger Z3Mem, allocation will fail and the aggressive
modes get disabled.
You can change the max size through /proc/sys/kernel/shmmax; the first
parameter is the max size.
* Use Picasso96 modes!
* Use DGA for your actual display! (If you don't, you CANNOT make any
comments about sluggish gfx performance. Understood?)
* Alternatively, use CGX3 with direct access to an S3 Virge PCI card
(see README.pci)
* Set as many of the comptrust* options as possible as aggressively as
you can without creating a crash [*** obsolete ***]
* For the adventurous: If you use the DGA2 executable with an XFree86 4.0x
server, AND select a Picasso video mode that has the same width as your
X virtual screen[1], AND haven't done anything else to prevent you from
using aggressive memory access (like setting comptrust* to indirect),
you *should* end up with vastly faster gfxmem access. This is still
buggy; display corruption occasionally occurs when using blits. But
for seeing how fast Doom can go on an "Amiga", this is the ticket ;-)
* If your app comes in versions for different CPUs, try all of them.
I have had good experiences with using 040 versions, particularly
of RC5 (use "-c 2" to select the 040 core). Of course, this only
works if the 040 apps don't use 040-specific features, or if you
have enabled 040 support for UAE
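
The sysv_shm limit mentioned in the tips above is easy to check before
starting the emulation. This is a minimal sketch, assuming that the first
number in /proc/sys/kernel/shmmax is the limit in bytes and that the
Z3Mem block is allocated in one piece, as described above; the function
names are made up for this example.

```python
DEFAULT_SHMMAX = 32 * 1024 * 1024   # the 32M default mentioned above


def parse_shmmax(text):
    """Take the first number in the contents of /proc/sys/kernel/shmmax."""
    return int(text.split()[0])


def z3mem_fits(z3mem_mb, shmmax_bytes=DEFAULT_SHMMAX):
    """Return True if a Z3Mem block of z3mem_mb megabytes fits the limit."""
    return z3mem_mb * 1024 * 1024 <= shmmax_bytes


if __name__ == "__main__":
    with open("/proc/sys/kernel/shmmax") as handle:
        limit = parse_shmmax(handle.read())
    for size_mb in (16, 32, 64, 128):
        verdict = "fits" if z3mem_fits(size_mb, limit) else "too large"
        print("Z3Mem of %dM: %s" % (size_mb, verdict))
```

With the default limit, z3mem_fits(32) is true and z3mem_fits(64) is
false, so a 64M Z3Mem would need shmmax raised first.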
Feedback:
=========
I need to know about remarkable experiences you have, but I really
don't need to know about unremarkable things. Here is a little guide
as to what is what:
Remarkable:
* Something that works with the compiler disabled, but fails with
it enabled
* Any occurrence of "illegal instruction" (from Linux) on a P6 class
machine, or a P5 class machine with avoid_cmov=yes
* Any failure to boot with a config file that does boot "normal"
UAE/linux
* Anything else that you can clearly identify as an emulation bug,
rather than as a configuration, hardware or user problem
* Any patches you can come up with
* Any offers of sponsorship for further work on it ;-)
Unremarkable:
* Any failures attributable to memory shortage
* Any problems you might have with linux, UAE or the Amiga in general,
not specific to the JIT compiler version
* Any statements to the effect that I am a traitor, a lamer, a wannabe,
a loser, a demigod, a guru, a procrastinator, or anything else along
those lines
* Any non-constructive criticism of my coding style. Remember: The only
valid form of criticism is a patch! (Of course, certain people
are excepted from this, notably everyone who would be involved
with integrating this code into other UAE versions ;-)
If you think you found something remarkable, PLEASE let me know. And
please describe the circumstances as precisely as possible --- only if
I can recreate the fault can I have a real shot at figuring out what
went wrong.
Good luck, and looking forward to your feedback,
Bernie (bmeyer@csse.monash.edu.au)
P.S.: There is some output to stdout/stderr while running (and some
directly to the tty). The lines that pop up every second have
a number of fields. Here are short explanations of each:
* compiled: The total number of bytes the compiled code (and
the related bookkeeping information) takes up
* soft: Number of soft cache flushes done in the last second
* hard: Number of hard cache flushes done in the last second
* trans: Number of 68k blocks translated in the last second
* check: Number of 68k blocks that had their checksums checked in
the last second as a result of a soft cache flush
* lost: Time "lost" during the last second of emulation time.
This should be 0, but if the emulation can't keep up for
some reason (like file I/O happening), it can be larger.
The output is in seconds; keep an eye on this if you
do self-timed benchmarks!
* debug/2/3/4: Internal counters I use for debugging. If you have
software that can make debug3 and/or debug4 reach more
than 100,000, please tell me about it --- these are counting
non-compiled FPU instructions executed.
[1] In reality, things are even more complex --- what you need to match
is the pitch of the mode. Normally, that matches the virtualwidth,
but my Trident 3DImage975 uses a pitch of 1024 for a 640 wide mode....